Efficient parameter estimation for ODE models of cellular processes using semi-quantitative data

Abstract Motivation Quantitative dynamical models facilitate the understanding of biological processes and the prediction of their dynamics. The parameters of these models are commonly estimated from experimental data. Yet, experimental data generated from different techniques do not provide direct information about the state of the system but a nonlinear (monotonic) transformation of it. For such semi-quantitative data, when this transformation is unknown, it is not apparent how the model simulations and the experimental data can be compared. Results We propose a versatile spline-based approach for the integration of a broad spectrum of semi-quantitative data into parameter estimation. We derive analytical formulas for the gradients of the hierarchical objective function and show that this substantially increases the estimation efficiency. Subsequently, we demonstrate that the method allows for the reliable discovery of unknown measurement transformations. Furthermore, we show that this approach can significantly improve the parameter inference based on semi-quantitative data in comparison to available methods. Availability and implementation Modelers can easily apply our method by using our implementation in the open-source Python Parameter EStimation TOolbox (pyPESTO) available at https://github.com/ICB-DCM/pyPESTO.


Derivation of analytical gradient formulas
Here, we derive the formulas for the calculation of the analytical gradient of the objective function with respect to the mechanistic parameters in hierarchical optimization.

Model and spline definition
We consider models based on a system of ordinary differential equations (ODEs)

ẋ(t, θ) = f(x(t, θ), θ),  x(t_0, θ) = x_0(θ),    (1)

in which f: R^{n_x} × R^{n_θ} → R^{n_x} is a vector field describing the temporal evolution of the state variables x ∈ R^{n_x}, and θ ∈ R^{n_θ} are the unknown mechanistic parameters. The measured properties of the model are its observables y := h(x, θ) ∈ R^{n_y}. The dimensionalities of the state, parameter, and observable vectors are denoted by n_x, n_θ, and n_y, respectively.

To simplify the following computations, we assume that the model has only one observable, that is, n_y = 1. In Subsection 1.5, we show that this can be done without loss of generality. Additionally, let this observable be a non-linear semi-quantitative observable. In other words, there exists a true but unknown non-linear monotone mapping g: R → R such that, assuming an additive normally distributed noise model, the measured data {z̄_k}_{k=1}^{n_t} are given by

z̄_k = g(h(x(t_k, θ), θ)) + ε_k,  ε_k ~ N(0, σ_k²),    (2)

for time points {t_k}_{k=1}^{n_t}, in which n_t is the number of measurement time points and σ = (σ_k)_{k=1}^{n_t} are the noise parameters. In some cases, the noise parameters are also unknown and have to be estimated. For such an observable, we refer to g(h(x(t_k, θ), θ)) as the observable values, and to y_k(θ) = h(x(t_k, θ), θ) as the (biochemical, epidemiological, ...) quantities of interest.

We approximate this non-linear mapping using a piecewise linear spline s: R × R^{n_ξ} → R, where n_ξ is the number of spline knots. For y in the subinterval [c_n, c_{n+1}], the spline is given by

s(y, ξ) := Σ_{j=1}^{n} ξ_j + (y − c_n)/∆_c · ξ_{n+1},    (3)

in which ξ are the differences between the heights of neighboring spline knots (Figure 1). In other words, the height of the i-th spline knot is equal to Σ_{j=1}^{i} ξ_j. With (c_j)_{j=1}^{n_ξ}, we denote the uniformly spaced bases of the spline knots, that is, c_j = c_1 + (j − 1) · ∆_c.

In applications, the true range of the biochemical or epidemiological quantities of interest is generally not known a priori. Thus, it is impossible to correctly fix the spline interval [c_1, c_{n_ξ}] in advance. However, since we use iterative optimization methods, at each optimization iteration we have access to the simulated quantities of interest (y_k(θ))_{k=1}^{n_t}. Therefore, to avoid this problem, we always rescale the spline interval to the current model output interval [min_k y_k(θ), max_k y_k(θ)]. This makes the spline knot bases dependent on the quantities of interest.
However, this does not cause any issues other than slightly increasing the complexity of the analytical gradient computation.
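As an illustration of this definition, the following minimal Python sketch evaluates the spline of Eq. (3) for a given vector of knot height differences. The helper name and the clipping of y to the spline interval are our own illustrative choices, not part of the implementation.

```python
import numpy as np

def spline_value(y, xi, c1, delta_c):
    """Evaluate the piecewise linear spline s(y, xi) of Eq. (3).

    `xi` holds the non-negative height differences between neighboring knots,
    and the knot bases are c_j = c1 + (j - 1) * delta_c. Illustrative sketch;
    y is clipped to the spline interval.
    """
    n_xi = len(xi)
    # 1-based index n of the subinterval [c_n, c_{n+1}] containing y
    n = int(np.clip(np.floor((y - c1) / delta_c) + 1, 1, n_xi - 1))
    c_n = c1 + (n - 1) * delta_c
    # height of the n-th knot plus linear interpolation on the subinterval
    return xi[:n].sum() + (y - c_n) / delta_c * xi[n]

# Example: 5 knots on [0, 1]; non-negative differences yield a monotone mapping
xi = np.array([0.0, 0.5, 0.3, 0.15, 0.05])
print(spline_value(0.6, xi, c1=0.0, delta_c=0.25))  # -> 0.86
```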

Objective function and the hierarchical optimization problem
The spline links the measured data to the quantities of interest of the non-linear semi-quantitative observable. For a dataset D = {(t_k, z̄_k)}_{k=1}^{n_t} and simulated observables y_k(θ) = h(x(t_k, θ), θ), k = 1, ..., n_t, this yields the negative log-likelihood function

J(θ, σ, ξ) = Σ_{k=1}^{n_t} [ 1/2 · log(2π σ_k²) + (z̄_k − s(y_k(θ), ξ))² / (2 σ_k²) ],    (5)

with s(y_k(θ), ξ) as defined in (3). Here, (n(k))_{k=1}^{n_t} denote the subinterval indices of the simulated quantities of interest, i.e. y_k(θ) ∈ [c_{n(k)}, c_{n(k)+1}] for k = 1, ..., n_t. We note that the objective function depends on the mechanistic parameters θ only through the simulations y_k(θ).

The objective function is optimized hierarchically in three nested levels,

min_θ J(θ, σ*(θ), ξ*(θ)),    (6)

ξ*(θ) = argmin_{ξ ≥ 0} [ min_σ J(θ, σ, ξ) ],   σ*(θ) = argmin_σ J(θ, σ, ξ*(θ)),    (7)

in which the mechanistic parameters are optimized in the outer optimization problem (6), whereas the spline parameters and the noise parameters are optimized in two nested inner optimization problems (7). The true measurement mapping is assumed to be monotone, so the spline parameters are constrained to be non-negative. This is sufficient even in the case of a decreasing true measurement mapping, as inverting the sign of the quantities of interest turns it into an increasing one. The main benefit of the hierarchical optimization framework is the simplicity of the inner problems.
The inner optimization problem for the noise parameters σ can be solved analytically, yielding

σ_I*² = 1/|I| · Σ_{k ∈ I} (z̄_k − s(y_k(θ), ξ))²,

in which I ⊂ {1, ..., n_t} denotes a subset of indices that share the noise parameter σ_I. The non-negativity constraint on σ is satisfied, as this analytical result already provides a non-negative value.
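A minimal sketch of this analytical update, assuming the Gaussian negative log-likelihood (5) above (the helper name and the array-based interface are our own):

```python
import numpy as np

def optimal_sigma(z_bar, s_y, indices):
    """Analytical noise estimate for measurements sharing one noise parameter.

    z_bar:   measured data points
    s_y:     spline-transformed simulations s(y_k, xi) at the same time points
    indices: index set I of the measurements that share sigma_I
    Returns the optimal sigma_I (square root of the mean squared residual).
    """
    residuals = np.asarray(z_bar)[indices] - np.asarray(s_y)[indices]
    return np.sqrt(np.mean(residuals ** 2))

sigma_I = optimal_sigma([1.0, 1.4, 2.1], [0.9, 1.5, 2.0], indices=[0, 1, 2])
```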
The inner problem for the spline parameters cannot be solved analytically, but we prove below that it is convex. Therefore, its numerical optimization is computationally efficient. The proof relies on the following theorem on the convexity of least-squares optimization problems with non-negativity constraints.
Theorem 1. Let an objective function L ∈ C²(R^{n_ξ}, R) be of the form

L(ξ) = Σ_{i=1}^{m} ((Y ξ)_i − Z_i)² + b,    (9)

where Y ∈ R^{m × n_ξ}, Z ∈ R^m, and b ∈ R. Then the optimization problem min_{ξ ≥ 0} L(ξ) is convex.

Proof. We show that the Hessian of this function is positive semi-definite. The gradient of the objective function (9) is

∇_ξ L(ξ) = 2 Y^T (Y ξ − Z).

Hence, the Hessian matrix of the objective function at the point ξ is given by the Jacobian of the gradient,

∇²_ξ L(ξ) = 2 Y^T Y.

This is a positive semi-definite matrix since, for any vector u ∈ R^{n_ξ}, the following holds:

u^T (2 Y^T Y) u = 2 ||Y u||² = 2 Σ_{i=1}^{m} (Y u)_i² ≥ 0,

where (Y u)_i is the i-th element of the vector Y u. In combination with the affine inequality constraints ξ ≥ 0, this implies that the optimization problem is convex.
The negative log-likelihood function defined in (5) can easily be reformulated into the form (9), where Z, Y, and b can depend on θ and σ but not on ξ. Thus, the inner optimization problem for the spline parameters is convex.
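For fixed θ and σ, the inner spline problem therefore has the structure of a non-negative least-squares problem. The following sketch builds the design matrix of the spline predictions, which are linear in ξ, and solves the weighted problem with SciPy's NNLS solver. This is only an illustration of the problem structure, not the solver used in the implementation described later (which uses L-BFGS-B for the inner problems); names and the weighting convention are our own.

```python
import numpy as np
from scipy.optimize import nnls

def fit_spline_parameters(y_sim, z_bar, sigma, c1, delta_c, n_xi):
    """Solve the inner spline problem for fixed theta and sigma.

    Builds the design matrix of the spline predictions (linear in xi) and
    solves the weighted non-negative least-squares problem. Illustrative
    sketch; the coupling to the noise parameters is ignored here.
    """
    n_t = len(y_sim)
    C = np.zeros((n_t, n_xi))
    for k, y in enumerate(y_sim):
        # 0-based index of the subinterval [c_n, c_{n+1}] containing y
        n = int(np.clip(np.floor((y - c1) / delta_c), 0, n_xi - 2))
        C[k, : n + 1] = 1.0                      # sum of the first knot differences
        C[k, n + 1] = (y - (c1 + n * delta_c)) / delta_c
    w = 1.0 / np.asarray(sigma)                  # weights from the noise model
    xi_opt, _ = nnls(C * w[:, None], np.asarray(z_bar) * w)
    return xi_opt

xi_opt = fit_spline_parameters(
    y_sim=[0.1, 0.4, 0.8], z_bar=[0.2, 0.9, 1.0], sigma=[0.1, 0.1, 0.1],
    c1=0.0, delta_c=0.25, n_xi=5,
)
```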

Analytical gradient
Due to the dependency of the optimal spline parameters ξ* and the optimal noise parameters σ* on the mechanistic parameters θ, the derivative of the objective function with respect to a mechanistic parameter θ_i contains two additional terms,

d/dθ_i J(θ, σ*(θ), ξ*(θ)) = Σ_{k=1}^{n_t} ∂J/∂y_k · ∂y_k/∂θ_i + Σ_I ∂J/∂σ_I · dσ_I*/dθ_i + Σ_{j=1}^{n_ξ} ∂J/∂ξ_j · dξ_j*/dθ_i,

where we refer to ∂y_k/∂θ_i as the observable sensitivity at time point t_k with respect to the parameter θ_i. The first gradient term can be calculated using forward sensitivity analysis (FSA) or adjoint sensitivity analysis (ASA). Furthermore, the inner problem with respect to the noise parameters is solved exactly, so the corresponding partial derivatives vanish and the second term is always 0. This leaves the third term to be obtained. In the following theorem, we prove that it is always equal to 0 as well. The theorem is equivalent to the envelope theorem, which is used mainly in economic theory [Carter, 2001].

Theorem 2. Let

(η*(θ), ϕ*(θ)) = argmin_{η, ϕ ≥ 0} J(θ, η, ϕ),

where J is a continuously differentiable, convex objective function and ϕ is constrained to be non-negative. Then, for all θ ∈ R^{n_θ}, i = 1, ..., n_θ, and j = 1, ..., n_ϕ,

∂J/∂η (θ, η*(θ), ϕ*(θ)) = 0    (18)

and

∂J/∂ϕ_j (θ, η*(θ), ϕ*(θ)) · dϕ_j*/dθ_i (θ) = 0.    (19)
Proof. The pair (η*(θ), ϕ*(θ)) is a solution to a convex optimization problem with affine inequality constraints. Therefore, Slater's condition holds [Slater, 1959], and there exist optimal Lagrange multipliers µ*(θ) = (µ_j*(θ))_{j=1}^{n_ϕ} such that the Karush-Kuhn-Tucker (KKT) conditions are satisfied:

∇_η J(θ, η*(θ), ϕ*(θ)) = 0,    (20)

∇_ϕ J(θ, η*(θ), ϕ*(θ)) − µ*(θ) = 0,    (21)

µ_j*(θ) · ϕ_j*(θ) = 0, for j = 1, ..., n_ϕ,    (22)

ϕ_j*(θ) ≥ 0, µ_j*(θ) ≥ 0, for j = 1, ..., n_ϕ.

The first KKT condition (20) directly proves the first statement of the theorem (18). Furthermore, due to the simple non-negativity constraints, the second KKT condition (21) states that the Lagrange multipliers are equal to the gradient of the objective function with respect to ϕ,

µ_j*(θ) = ∂J/∂ϕ_j (θ, η*(θ), ϕ*(θ)), for j = 1, ..., n_ϕ.    (23)

To prove the second statement of the theorem (19) for an index j ∈ {1, ..., n_ϕ}, we consider two cases: ϕ_j* > 0 and ϕ_j* = 0.

Case 1: ϕ_j* > 0. From the third KKT condition (22), we then know that µ_j*(θ) = 0. Therefore, the statement follows from (23).

Case 2: ϕ_j* = 0. For an index i ∈ {1, ..., n_θ}, we differentiate the third KKT condition (22) with respect to θ_i to obtain

dµ_j*/dθ_i (θ) · ϕ_j*(θ) + µ_j*(θ) · dϕ_j*/dθ_i (θ) = 0.

Since ϕ_j*(θ) = 0, this yields µ_j*(θ) · dϕ_j*/dθ_i (θ) = 0, and together with (23) the second statement (19) follows.

Since the negative log-likelihood function defined in (5) is convex, it satisfies the theorem's conditions, in this case with η = σ and ϕ = ξ. Therefore, the derivative of the objective function with respect to a mechanistic parameter θ_i can be computed analytically and is equal to

d/dθ_i J(θ, σ*(θ), ξ*(θ)) = Σ_{k=1}^{n_t} ∂J/∂y_k · ∂y_k/∂θ_i.
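In practice, this means that only the residuals, the local spline slopes, and the observable sensitivities enter the gradient. A minimal sketch under the Gaussian likelihood (5), assuming forward sensitivities are available as an array of shape (n_t, n_θ); names are our own, and the ξ-only regularization term introduced below is omitted since it does not depend on y:

```python
import numpy as np

def objective_gradient(y_sim, z_bar, sigma, xi, c1, delta_c, dy_dtheta):
    """Analytical gradient dJ/dtheta at the inner optimum (Theorem 2).

    The inner-parameter terms vanish, so only the explicit dependence through
    the simulated quantities of interest y_k contributes. Illustrative sketch;
    dy_dtheta holds forward sensitivities of shape (n_t, n_theta).
    """
    n_xi = len(xi)
    grad = np.zeros(dy_dtheta.shape[1])
    for k, y in enumerate(y_sim):
        n = int(np.clip(np.floor((y - c1) / delta_c), 0, n_xi - 2))
        s_k = xi[: n + 1].sum() + (y - (c1 + n * delta_c)) / delta_c * xi[n + 1]
        ds_dy = xi[n + 1] / delta_c                        # local spline slope
        dJ_dy = (s_k - z_bar[k]) / sigma[k] ** 2 * ds_dy   # dJ/dy_k from Eq. (5)
        grad += dJ_dy * dy_dtheta[k]
    return grad
```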

Spline regularization

In our implementation, the spline (3) is additionally regularized to reduce overfitting. The regularization penalizes the non-linearity of the spline with the aim of minimizing false-positive non-linear measurement mappings. To achieve this, we add the regularization term

R(ξ) = Σ_{j=1}^{n_ξ} (ξ_j − α*(ξ) · c_j − β*(ξ))²

to the objective function, where

(α*(ξ), β*(ξ)) = argmin_{α ∈ R, β ≥ 0} Σ_{j=1}^{n_ξ} (ξ_j − α · c_j − β)².    (26)

In other words, α*(ξ) and β*(ξ) are the optimal scaling and offset of a linear regression of the spline knots {(c_j, ξ_j)}_{j=1}^{n_ξ}, and the regularization penalizes the distance between the spline knots and the optimal regression line. Having a negative offset is not plausible in most applications; thus, we constrain the offset β to be non-negative.

We show that this optimization problem can be solved analytically. To do so, we state and prove the following simple proposition.
Proposition. Let (ᾱ, β̄) be the unique solution of the unconstrained optimization problem

(ᾱ, β̄) = argmin_{α, β} L(α, β),

where L: R² → R is a strictly convex objective function. Then the unique solution (α*, β*) of the optimization problem with a positivity constraint,

(α*, β*) = argmin_{α, β} L(α, β)  s.t.  β ≥ 0,

satisfies the following: if β̄ ≥ 0, then (α*, β*) = (ᾱ, β̄); if β̄ < 0, then β* = 0. In other words, we can solve for the positivity-constrained variable by simply solving the unconstrained problem and inspecting whether its solution is feasible. If it is, then the constrained solution is given by it. Otherwise, the optimal solution lies on the constraint, i.e. β* = 0.

Proof. The first statement of the proposition is trivial: if the solution of the unconstrained problem lies in the feasible region of the constrained problem, then the solutions (ᾱ, β̄) and (α*, β*) are equal, as both problems minimize the same strictly convex objective function. To prove the second statement, let us assume the opposite: β̄ < 0, but β* > 0. We connect the two solutions by a line in the two-dimensional parameter space, l: [0, 1] → R², from (α*, β*) to (ᾱ, β̄): l(x) = (1 − x) · (α*, β*) + x · (ᾱ, β̄). Since these points lie on opposite sides of the constraint β = 0, the line crosses the constraint at some point (α̃, 0). This point satisfies L(α*, β*) < L(α̃, 0), because it lies in the feasible parameter space and is not equal to the optimal point (α*, β*), as it lies on the constraint while β* > 0. However, since (ᾱ, β̄) is the unique minimizer of the unconstrained problem, we also know that L(α̃, 0) > L(ᾱ, β̄). In conclusion, we have found a line l with three points on it that satisfy L(α*, β*) < L(α̃, 0) > L(ᾱ, β̄). This contradicts the strict convexity of the function L, and therefore the second statement holds as well.

Now we apply the proposition to our problem. The optimization problem (26) satisfies the conditions of Theorem 1, so it is convex. Furthermore, it is strictly convex due to the full rank of the corresponding Hessian matrix Y^T Y, which makes the positive semi-definiteness inequality in the proof of Theorem 1 strict. Thus, it satisfies the conditions of the proposition. To obtain the solution of the unconstrained version of the problem, we equate the gradient of the objective function to 0. This gives a linear system whose solution is

ᾱ(ξ) = (n_ξ · Σ_j c_j ξ_j − Σ_j c_j · Σ_j ξ_j) / (n_ξ · Σ_j c_j² − (Σ_j c_j)²),

β̄(ξ) = (Σ_j ξ_j − ᾱ(ξ) · Σ_j c_j) / n_ξ.

According to the proposition, to obtain the solution of the constrained problem we inspect the sign of β̄(ξ). If it is non-negative, the solution is equal to (ᾱ(ξ), β̄(ξ)). Otherwise, the optimal β*(ξ) is equal to 0. To obtain the optimal scaling α*(ξ) for this case, we turn to the KKT conditions of the constrained optimization problem:

−2 Σ_{j=1}^{n_ξ} c_j · (ξ_j − α*(ξ) · c_j − β*(ξ)) = 0,    (30)

−2 Σ_{j=1}^{n_ξ} (ξ_j − α*(ξ) · c_j − β*(ξ)) − µ* = 0,  µ* · β*(ξ) = 0,  µ* ≥ 0,

where µ* ∈ R is the optimal Lagrange multiplier. The first KKT condition (30) is a linear equation in α*(ξ) whose solution for β*(ξ) = 0 is

α*(ξ) = Σ_j c_j ξ_j / Σ_j c_j².

Pulling the two cases together, we can write the final analytical solution of the optimization problem as

(α*(ξ), β*(ξ)) = (ᾱ(ξ), β̄(ξ)) if β̄(ξ) ≥ 0, and (α*(ξ), β*(ξ)) = (Σ_j c_j ξ_j / Σ_j c_j², 0) otherwise.

This regularization is appended to the objective function of the model optimization (5). Thus, the complete objective function is given as

J_λ(θ, σ, ξ) = J(θ, σ, ξ) + λ · R(ξ),

where λ ∈ R_+ is the regularization strength parameter. We note that this addition does not change the result of the analytical gradient (Theorem 2), as the objective function remains convex.
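A minimal sketch of this case distinction, following the regression of the pairs (c_j, ξ_j) described above (the helper name and the absence of a normalization constant are our own simplifications):

```python
import numpy as np

def regularization_term(xi, c):
    """Spline regularization: squared distance of the knot pairs (c_j, xi_j)
    from the best linear fit with non-negative offset.

    Solves the unconstrained least squares first; if the offset is negative,
    it is clamped to zero and the scaling is refit (regression through the
    origin), as in the analytical solution above. Illustrative sketch.
    """
    xi = np.asarray(xi, dtype=float)
    c = np.asarray(c, dtype=float)
    # unconstrained ordinary least-squares fit xi ~ alpha * c + beta
    A = np.column_stack([c, np.ones_like(c)])
    (alpha, beta), *_ = np.linalg.lstsq(A, xi, rcond=None)
    if beta < 0.0:
        beta = 0.0
        alpha = (c @ xi) / (c @ c)   # KKT solution on the constraint
    return float(np.sum((xi - alpha * c - beta) ** 2))

reg = regularization_term(xi=[0.0, 0.5, 0.3, 0.15, 0.05], c=np.linspace(0.0, 1.0, 5))
```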

Models with multiple observables
Here, we show that the results obtained in the previous subsections hold for a model with an arbitrary number of semi-quantitative observables n_y. The semi-quantitative data of each observable are linked to the corresponding quantities of interest analogously to (2), where each semi-quantitative observable has its own spline function s_i: R × R^{n_ξ,i} → R. We assume that each of the splines depends on its own vector of spline parameters ξ_i and that these parameters are not shared among observables. Similarly, we assume that the observable noise parameters σ_i are not shared between observables. Thus, the overall objective function J is a sum of n_y observable-specific objective functions J_i(θ, σ_i, ξ_i), where each J_i is of the form (5) evaluated on the data set D_i of the i-th observable. Therefore, the hierarchical optimization problem contains n_y separate inner sub-problems that can be solved as before.
To efficiently solve the outer optimization problem, we need to calculate the gradient of the objective function J with respect to the mechanistic parameters θ. Since the objective function is additive in the observables, this is given by

∇_θ J(θ, σ*, ξ*) = Σ_{i=1}^{n_y} ∇_θ J_i(θ, σ_i*, ξ_i*),

where each ∇_θ J_i(θ, σ_i*, ξ_i*) can be obtained as before.

Model details
For the evaluation of the proposed method, we employed five models in total: one toy model T1 and four published models M1-M4. The models contain a varying number of states, mechanistic parameters, observables, and data points (Table 1). The published models are taken from a collection of parameter estimation problems in PEtab format, which is based on the benchmark collection by Hass et al. [2019].
These models were originally calibrated on quantitative measurements. Thus, to benchmark the proposed method, we modified the observation model to include non-linear measurement mappings. However, as the original model measurements were already noise-corrupted, it is not possible to retain the same noise model by simply applying the non-linear mappings to those measurements. Therefore, we generated synthetic noise-free data, transformed it with the non-linear mappings, and only then added noise of the original magnitude. This results in the same noise model as in the original model versions. The mechanistic parameters used to generate the synthetic data were estimated using the original quantitative measurements; these nominal mechanistic parameters are available in the benchmark collection mentioned above.
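The data-generation procedure can be summarized by the following sketch; the simulator, the mapping g, the time grid, and the noise level are placeholders for illustration and not the settings of the benchmark models:

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def generate_semiquantitative_data(simulate, g, t, theta_nominal, sigma):
    """Generate synthetic semi-quantitative data as described above:
    simulate noise-free quantities of interest with the nominal parameters,
    apply the non-linear measurement mapping, then add Gaussian noise.
    `simulate` and `g` are placeholder callables for illustration."""
    y = simulate(theta_nominal, t)            # noise-free quantities of interest
    z = g(y)                                  # non-linear measurement mapping
    return z + rng.normal(0.0, sigma, size=z.shape)

# Placeholder example: a saturating mapping applied to a toy simulation
data = generate_semiquantitative_data(
    simulate=lambda theta, t: theta[0] * (1 - np.exp(-theta[1] * t)),
    g=lambda y: y / (0.5 + y),
    t=np.linspace(0.0, 10.0, 20),
    theta_nominal=np.array([2.0, 0.3]),
    sigma=0.05,
)
```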

Model T1: FRET probe activation
Model T1 is the model of FRET probe activation introduced by Birtwistle et al. [2011]. The model assumes Michaelis-Menten (MM) mechanisms for both probe activation and deactivation. All mechanistic, noise, and observable parameters are presented in Table 2. The model consists of one differential equation and one conservation law. The model's only observable is the ratiometric imaging intensity ratio R. Consistent with the notation established in Section 1, the observation function of this observable is the simple identity function h(P*) = P*, whereas the non-linear measurement mapping g is given by the intensity ratio R defined in (45).
To obtain synthetic data, we simulated the model with the true values of the mechanistic parameters from Table 2, applied the measurement mapping function (45), and added additive normal noise with the standard deviation given in the same table. Three model variants with different observation models were estimated. The first is the parameterized model: its observation model is a parameterization of the true measurement mapping (45). To avoid non-identifiability of the observable parameters, instead of directly estimating f_AA, f_AD, and f_DD, we estimated a parameterization with only two observable parameters α and β,

g_par(P*) = α · P*(P_TOT − P*) + β.
The second model variant estimated the measurement mapping using a linear function. Lastly, the third variant used the proposed spline estimation method.

Model M1: STAT5 dimerization
The STAT5 dimerization model was introduced by Boehm et al. [2014]. It possesses three observables: pSTAT5A_rel, pSTAT5B_rel, and rSTAT5A_rel. We refer to these observables as the first, second, and third, in that order. The chosen synthetic non-linear measurement mappings are shown in Figure 3: saturation, sigmoid, and censoring. The specific numerical values of these mappings were chosen such that they are significantly non-linear in the range of the respective simulated biochemical quantities.
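For illustration only, generic functional forms of these three mapping types could look as follows; the parameter values below are placeholders, not the values used for model M1:

```python
import numpy as np

# Generic shapes of the three synthetic mapping types (placeholder parameters).
def saturation(y, v_max=1.0, k=0.5):
    """Michaelis-Menten-like saturation of the quantity of interest."""
    return v_max * y / (k + y)

def sigmoid(y, y0=0.5, slope=10.0):
    """Sigmoidal (switch-like) measurement mapping."""
    return 1.0 / (1.0 + np.exp(-slope * (y - y0)))

def censoring(y, lower=0.2, upper=0.8):
    """Censoring: values outside the detection window are reported at its limits."""
    return np.clip(y, lower, upper)
```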
algorithm [Jones et al., 2001]. Both optimizers were accessed through the pyPESTO interface with the default optimizer settings. For optimization of inner problems, we used SciPy's L-BFGS-B algorithm with default settings. For ODE integration, we used the AMICI Python toolbox [Fröhlich et al., 2021].
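A rough sketch of the corresponding pyPESTO workflow is given below. The PEtab file name and the number of starts are placeholders, the optimizer shown is a generic SciPy optimizer rather than the specific outer optimizer used in the study, and marking observables as semi-quantitative is done in the PEtab problem definition (the exact options depend on the pyPESTO version):

```python
import pypesto.optimize as optimize
import pypesto.petab

# Placeholder PEtab problem; semi-quantitative observables and spline handling
# are configured in the PEtab files / importer options (version dependent).
importer = pypesto.petab.PetabImporter.from_yaml("model_M1.yaml")
problem = importer.create_problem()

# Multi-start local optimization of the outer problem
optimizer = optimize.ScipyOptimizer()
result = optimize.minimize(problem=problem, optimizer=optimizer, n_starts=100)
```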

Parameter inference study
In the last subsection of the Results section in the manuscript, we presented a study focused on parameter inference. We compared the linear estimation and spline estimation approaches in the four application examples M1-M4 with their respective synthetic non-linear measurement mappings. Furthermore, we evaluated the impact of an increasing number of unknown measurement mappings.
For illustration, we consider the model M1. It contains three observables, each with its own non-linear measurement mapping (Fig. 3). However, since these mappings are synthetic and thus known, we were able to include the mappings {g_i}_{i=1}^{3} in the observable functions {h_i}_{i=1}^{3}, transforming the semi-quantitative observables into standard quantitative ones. In this way, we obtained variants of the same initial model M1 with a varying number of unknown measurement mappings. Furthermore, to remove any observable bias, we considered all combinations of unknown and known observables (Table 3).

Table 3: Observable combinations of model M1. The two symbols denote whether the measurement mapping of the respective observable is known or unknown.
For models M2 and M3 this is simple, as they contain only one observable, so we considered two variants: known observable mapping and unknown observable mapping. Model M4 contains 8 observables, so it would have been computationally too expensive to include all 2^8 = 256 model variants with the three estimation approaches. Therefore, to reduce the observable bias while keeping the number of model variants reasonably low, we considered the model M4 variants shown in Table 4.

Table 4: Observable combinations of model M4. The two symbols denote whether the measurement mapping of the respective observable is known or unknown.
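The known/unknown combinations can be enumerated programmatically, e.g. for the three observables of model M1 (a sketch; observable identifiers follow the benchmark naming):

```python
from itertools import product

observables_m1 = ["pSTAT5A_rel", "pSTAT5B_rel", "rSTAT5A_rel"]
# Every combination of known / unknown measurement mappings (2^3 = 8 variants)
combinations = list(product([True, False], repeat=len(observables_m1)))
print(len(combinations))   # 8
print(2 ** 8)              # 256 variants would be needed for model M4
```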

Figure 1: Illustration of the spline definition notation.

Figure 3: Synthetic non-linear semi-quantitative data of model M1. The simulated quantities of interest are denoted on the x-axis in green, whereas the semi-quantitative measurements are denoted on the y-axis in blue. The chosen non-linear measurement mappings are denoted in dashed blue for each observable (A-C).

Table 1: Application models. Consistent with the notation from Section 1, n_x, n_θ, and n_y denote the numbers of state variables, mechanistic parameters, and observables, respectively. With |D| we denote the cardinality of the data set. (Columns: Model, n_x, n_θ, n_y, |D|.)

Table 2: Toy model T1 parameters. The horizontal lines separate mechanistic, noise, and observable parameters, respectively.